Data Extraction Intern - Summer

Location: United States

Type: Internship

Min. Experience: Student (College)

DATA EXTRACTION INTERN - Summer

 

Quad Analytix is an exciting, early-stage data startup headquartered in the Bay Area, CA. We gather e-commerce information from a variety of sources, then normalize and analyze it to create insights that our customers visualize to do their jobs more effectively.

 

If you love data, want an opportunity to learn and grow with our company, and are excited about your work directly impacting the entire company, Quad is a great fit for you. We have an exciting environment with smart, down-to-earth people who enjoy having fun while working together.

 

JOB DESCRIPTION:

We are looking for someone who is detail-oriented and passionate about the quality of their work, and who can help advance our platform for acquiring data from the internet, extracting its semantic meaning, and structuring it, whether that data comes from the "classic" web or the "social" web. We want you to be excited about constantly experimenting with new approaches while staying dedicated to building scalable, operationally friendly production systems.

 

As a Data Extraction Intern you will work with the product, development, and operations teams to understand requirements, articulate an approach, and design and execute both working prototypes and production systems. Your main responsibilities will include:

  • Develop tools to extract information from the web, email, and social media
  • Continually test different approaches to best meet data extraction needs
  • Design and develop extractors that are efficient and robust
  • Execute test and QA processes to ensure high-quality results

 

DESIRED SKILLS AND QUALIFICATIONS:

  • Work experience of 0-2 years
  • Bachelor's degree in Computer Science
  • Knowledge of Python and Node.js
  • Experience working with web extraction libraries
  • Good knowledge of network programming
  • Understanding of web technologies such as HTML, CSS, JavaScript, and regular expressions
  • Understanding of browser internals (e.g. WebKit, Gecko); experience developing browser extensions is a plus

Desirable:

  • Understanding of Selenium, PhantomJS, and similar UI automation test tools
  • Experience with information extraction using software tools such as Kapow and Connotate
  • Working understanding of databases, REST/web services, and data formats such as XML and JSON
  • Experience with Java and big data technologies such as Hadoop and MongoDB
  • Experience at companies doing web analytics and semantic data extraction at scale using distributed crawler infrastructure. An understanding of the context in which distributed, well-behaved crawlers operate is also desirable, e.g. knowledge of robots.txt, user agents, CAPTCHA/reCAPTCHA algorithms, and intrusion detection/prevention systems

 

OTHER CHARACTERISTICS:

  • Strong communication skills and ability to work effectively in teams
  • Intellectually curious, with a passion for learning and growing professionally
  • Able to provide off-hours support as needed in emergency situations
  • Strong work ethic and proactive approach to problem solving
  • Enjoys having fun at work and collaborating with smart, humble people every day